 rna structure


RiboFlow: Conditional De Novo RNA Co-Design via Synergistic Flow Matching

Neural Information Processing Systems

Ribonucleic acid (RNA) binds to molecules to achieve specific biological functions. While generative models are advancing biomolecule design, existing methods for designing RNA that target specific ligands face limitations in capturing RNA's conformational flexibility, ensuring structural validity, and overcoming data scarcity. To address these challenges, we introduce RiboFlow, a synergistic flow matching model to co-design RNA structures and sequences based on target molecules. By integrating RNA backbone frames, torsion angles, and sequence features in a unified architecture, RiboFlow explicitly models RNA's dynamic conformations while enforcing sequence-structure consistency to improve validity. Additionally, we curate RiboBind, a large-scale dataset of RNA-molecule interactions, to resolve the scarcity of high-quality structural data. Extensive experiments reveal that RiboFlow not only outperforms state-of-the-art RNA design methods by a large margin but also showcases controllable capabilities for achieving high binding affinity to target ligands.


Geometric Algebra-Enhanced Bayesian Flow Network for RNA Inverse Design

Neural Information Processing Systems

With the development of biotechnology, RNA therapies have shown great potential. However, different from proteins, the sequences corresponding to a single RNA three-dimensional structure are more abundant. Most of the existing RNA design methods merely take into account the secondary structure of RNA, or are only capable of generating a limited number of candidate sequences. To address these limitations, we propose a geometric-algebra-enhanced Bayesian Flow Network for the inverse design of RNA, called RBFN. RBFN uses a Bayesian Flow Network to model the distribution of nucleotide sequences in RNA, enabling the generation of more reasonable RNA sequences. Meanwhile, considering the more flexible characteristics of RNA conformations, we utilize geometric algebra to enhance the modeling ability of the RNA three-dimensional structure, facilitating a better understanding of RNA structural properties. In addition, due to the scarcity of RNA structures and the limitation that there are only four types of nucleic acids, we propose a new time-step distribution sampling to address the scarcity of RNA structure data and the relatively small number of nucleic acid types. Evaluation on the single-state fixed-backbone re-design benchmark and multi-state fixed-backbone benchmark indicates that RBFN can outperform existing RNA design methods in various RNA design tasks, enabling effective RNA sequence design.


BeeRNA: tertiary structure-based RNA inverse folding using Artificial Bee Colony

arXiv.org Artificial Intelligence

The Ribonucleic Acid (RNA) inverse folding problem, designing nucleotide sequences that fold into specific tertiary structures, is a fundamental computational biology problem with important applications in synthetic biology and bioengineering. The design of complex three-dimensional RNA architectures remains computationally demanding and mostly unresolved, as most existing approaches focus on secondary structures. In order to address tertiary RNA inverse folding, we present BeeRNA, a bio-inspired method that employs the Artificial Bee Colony (ABC) optimization algorithm. Our approach combines base-pair distance filtering with RMSD-based structural assessment using RhoFold for structure prediction, resulting in a two-stage fitness evaluation strategy. To guarantee biologically plausible sequences with balanced GC content, the algorithm takes thermodynamic constraints and adaptive mutation rates into consideration. In this work, we focus primarily on short and medium-length RNAs (< 100 nucleotides), a biologically significant regime that includes microRNAs (miRNAs), aptamers, and ribozymes, where BeeRNA achieves high structural fidelity with practical CPU runtimes. The lightweight, training-free implementation will be publicly released for reproducibility, offering a promising bio-inspired approach for RNA design in therapeutics and biotechnology.
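As a rough illustration of the two-stage fitness idea described in this abstract, the sketch below filters candidates by base-pair distance on dot-bracket secondary structures before calling a costlier 3D score. It is not BeeRNA's implementation: the function names, the `max_bp_distance` threshold, the infinite-penalty convention, and the `rmsd_fn` callback (standing in for a structure predictor plus RMSD computation) are all assumptions made for this example.

```python
from typing import Callable, Set, Tuple


def gc_content(seq: str) -> float:
    """Fraction of G/C nucleotides in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(1 for base in seq if base in "GC") / len(seq)


def base_pairs(dot_bracket: str) -> Set[Tuple[int, int]]:
    """Parse a pseudoknot-free dot-bracket string into a set of (i, j) pairs."""
    stack = []
    pairs = set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' in dot-bracket string")
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced '(' in dot-bracket string")
    return pairs


def base_pair_distance(a: str, b: str) -> int:
    """Number of base pairs present in exactly one of the two structures."""
    if len(a) != len(b):
        raise ValueError("structures must have equal length")
    return len(base_pairs(a) ^ base_pairs(b))


def two_stage_fitness(
    candidate_ss: str,
    target_ss: str,
    rmsd_fn: Callable[[], float],
    max_bp_distance: int = 4,  # hypothetical threshold, not taken from the paper
) -> float:
    """Stage 1: cheap secondary-structure filter. Stage 2: expensive 3D score.

    Lower is better; candidates failing the filter get an infinite penalty
    so the expensive callback is never invoked for them.
    """
    if base_pair_distance(candidate_ss, target_ss) > max_bp_distance:
        return float("inf")
    return rmsd_fn()
```

In this sketch the expensive callback runs only for candidates that pass the cheap filter, which is the general motivation for a staged evaluation; a GC-content bound could be checked in the first stage in the same way.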


A Comprehensive Benchmark for RNA 3D Structure-Function Modeling

arXiv.org Machine Learning

The RNA structure-function relationship has recently garnered significant attention within the deep learning community, promising to grow in importance as nucleic acid structure models advance. However, the absence of standardized and accessible benchmarks for deep learning on RNA 3D structures has impeded the development of models for RNA functional characteristics. In this work, we introduce a set of seven benchmarking datasets for RNA structure-function prediction, designed to address this gap. Our library builds on the established Python library rnaglib, and offers easy data distribution and encoding, splitters and evaluation methods, providing a convenient all-in-one framework for comparing models. Datasets are implemented in a fully modular and reproducible manner, facilitating community contributions and customization. Finally, we provide initial baseline results for all tasks using a graph neural network. Source code: https://github.com/cgoliver/rnaglib Documentation: https://rnaglib.org


RiboGen: RNA Sequence and Structure Co-Generation with Equivariant MultiFlow

arXiv.org Artificial Intelligence

Ribonucleic acid (RNA) plays fundamental roles in biological systems, from carrying genetic information to performing enzymatic function. Understanding and designing RNA can enable novel therapeutic applications and biotechnological innovation. To enhance RNA design, in this paper we introduce RiboGen, the first deep learning model to simultaneously generate RNA sequence and all-atom 3D structure. RiboGen leverages the standard Flow Matching with Discrete Flow Matching in a multimodal data representation. RiboGen is based on Euclidean Equivariant neural networks for efficiently processing and learning three-dimensional geometry. Our experiments show that RiboGen can efficiently generate chemically plausible and self-consistent RNA samples. Our results suggest that co-generation of sequence and structure is a competitive approach for modeling RNA.


Accurate RNA 3D structure prediction using a language model-based deep learning approach

arXiv.org Artificial Intelligence

Accurate prediction of RNA three-dimensional (3D) structure remains an unsolved challenge. Determining RNA 3D structures is crucial for understanding their functions and informing RNA-targeting drug development and synthetic biology design. The structural flexibility of RNA, which leads to scarcity of experimentally determined data, complicates computational prediction efforts. Here, we present RhoFold+, an RNA language model-based deep learning method that accurately predicts 3D structures of single-chain RNAs from sequences. By integrating an RNA language model pre-trained on ~23.7 million RNA sequences and leveraging techniques to address data scarcity, RhoFold+ offers a fully automated end-to-end pipeline for RNA 3D structure prediction. Retrospective evaluations on RNA-Puzzles and CASP15 natural RNA targets demonstrate RhoFold+'s superiority over existing methods, including human expert groups. Its efficacy and generalizability are further validated through cross-family and cross-type assessments, as well as time-censored benchmarks. Additionally, RhoFold+ predicts RNA secondary structures and inter-helical angles, providing empirically verifiable features that broaden its applicability to RNA structure and function studies.


RNA-FrameFlow: Flow Matching for de novo 3D RNA Backbone Design

arXiv.org Artificial Intelligence

We introduce RNA-FrameFlow, the first generative model for 3D RNA backbone design. We build upon SE(3) flow matching for protein backbone generation and establish protocols for data preparation and evaluation to address unique challenges posed by RNA modeling. We formulate RNA structures as a set of rigid-body frames and associated loss functions which account for larger, more conformationally flexible RNA backbones (13 atoms per nucleotide) vs. proteins (4 atoms per residue). Toward tackling the lack of diversity in 3D RNA datasets, we explore training with structural clustering and cropping augmentations. Additionally, we define a suite of evaluation metrics to measure whether the generated RNA structures are globally self-consistent (via inverse folding followed by forward folding) and locally recover RNA-specific structural descriptors. The most performant version of RNA-FrameFlow generates locally realistic RNA backbones of 40-150 nucleotides, over 40% of which pass our validity criteria as measured by a self-consistency TM-score >= 0.45, at which two RNAs have the same global fold. Open-source code: https://github.com/rish-16/rna-backbone-design
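The validity figure quoted in this abstract is a pass rate over generated backbones. A minimal sketch of that bookkeeping follows, assuming the self-consistency TM-scores have already been computed elsewhere (inverse folding followed by forward folding is not reproduced here). The function name and the example scores are illustrative and not taken from the paper's code; only the 0.45 threshold comes from the abstract.

```python
from typing import Iterable


def validity_rate(sc_tm_scores: Iterable[float], threshold: float = 0.45) -> float:
    """Fraction of samples whose self-consistency TM-score meets the threshold."""
    scores = list(sc_tm_scores)
    if not scores:
        raise ValueError("at least one score is required")
    passed = sum(1 for score in scores if score >= threshold)
    return passed / len(scores)


# Illustrative, made-up scores for four generated backbones.
example_scores = [0.50, 0.45, 0.30, 0.20]
rate = validity_rate(example_scores)
```

The comparison is inclusive, matching the ">= 0.45" criterion stated above.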


3D-based RNA function prediction tools in rnaglib

arXiv.org Artificial Intelligence

Understanding the connection between complex structural features of RNA and biological function is a fundamental challenge in evolutionary studies and in RNA design. However, building datasets of RNA 3D structures and making appropriate modeling choices remain time-consuming and lack standardization. In this chapter, we describe the use of rnaglib to train supervised and unsupervised machine learning-based function prediction models on datasets of RNA 3D structures.


#ICML2023 invited talk: Jennifer Doudna on machine learning for biological research

AIHub

The programme of the International Conference on Machine Learning (ICML) featured an invited talk by Jennifer Doudna entitled "The future of ML in biology: CRISPR for health and climate". Jennifer Doudna and Emmanuelle Charpentier won the 2020 Nobel Prize in Chemistry for "the development of a method for genome editing". The method in question is often referred to as CRISPR/Cas9 genetic scissors. Using this technique, researchers can change the DNA of animals, plants and microorganisms with extremely high precision. This technology has already had a huge impact on the biological sciences.


Physics-aware Graph Neural Network for Accurate RNA 3D Structure Prediction

arXiv.org Artificial Intelligence

Biological functions of RNAs are determined by their three-dimensional (3D) structures. Thus, given the limited number of experimentally determined RNA structures, the prediction of RNA structures will facilitate elucidating RNA functions and RNA-targeted drug discovery, but remains a challenging task. In this work, we propose a Graph Neural Network (GNN)-based scoring function trained only with the atomic types and coordinates on limited solved RNA 3D structures for distinguishing accurate structural models. The proposed Physics-aware Multiplex Graph Neural Network (PaxNet) separately models the local and non-local interactions inspired by molecular mechanics. Furthermore, PaxNet contains an attention-based fusion module that learns the individual contribution of each interaction type for the final prediction. We rigorously evaluate the performance of PaxNet on two benchmarks and compare it with several state-of-the-art baselines. The results show that PaxNet significantly outperforms all the baselines overall, and demonstrate the potential of PaxNet for improving the 3D structure modeling of RNA and other macromolecules. Our code is available at https://github.com/zetayue/Physics-aware-Multiplex-GNN.